Objectives

library(tidyverse)
library(knitr)
library(broom)
library(stringr)
library(modelr)
library(forcats)
library(ggmap)
library(plotly)

options(digits = 3)
set.seed(1234)
theme_set(theme_minimal())

Geospatial visualization

Geospatial visualizations are some of the oldest data visualization methods in human existence. Data maps were first popularized in the seventeenth century and have grown in complexity and detail since then. Consider Google Maps, the sheer volume of data depicted, and the analytical pathways available to its users.

Geometric visualizations are used to depict spatial features, and with the incorporation of data reveal additional attributes and information. The main features of a map are defined by its scale (the proportion between distances and sizes on the map), its projection (how the three-dimensional Earth is represented on a two-dimensional surface), and its symbols (how data is depicted and visualized on the map).1

Map boundaries

Drawing maps in R is a layer-like process. Typically you start by drawing the boundaries of the geographic regions you wish to visualize, then you add additional layers defined by other variables:

  • Points
  • Symbols
  • Fills (choropleths)

Let’s start by first drawing a map’s boundaries. Later on we will fill these in with data and turn them into data visualizations.

Storing map boundaries

A geographic information system (GIS) is software that is “designed to capture, store, manipulate, analyze, manage, and present spatial or geographic data”.2 There are many specialized software packages for spatial data analysis, many of which are commercial or proprietary software. For serious spatial data analysis tasks, you probably want to learn how to use these products. However for casual usage, R has a number of tools for drawing maps, most importantly ggplot2.

Using maps boundaries

The maps package includes the map() function for drawing maps based on bundled geodatabases using the graphics package:

library(maps)

# map of the world
map()

# usa boundaries
map("usa")

map("state")

# county map of illinois
map("county", "illinois")

These are fine, but we’d rather use them with our friendly ggplot2 library. We can do this by converting the geodatabases into data frames for plotting with ggplot2.

# map of the world
map_data("world") %>%
  as_tibble
## # A tibble: 99,338 × 6
##     long   lat group order region subregion
##  * <dbl> <dbl> <dbl> <int>  <chr>     <chr>
##  1 -69.9  12.5     1     1  Aruba      <NA>
##  2 -69.9  12.4     1     2  Aruba      <NA>
##  3 -69.9  12.4     1     3  Aruba      <NA>
##  4 -70.0  12.5     1     4  Aruba      <NA>
##  5 -70.1  12.5     1     5  Aruba      <NA>
##  6 -70.1  12.6     1     6  Aruba      <NA>
##  7 -70.0  12.6     1     7  Aruba      <NA>
##  8 -70.0  12.6     1     8  Aruba      <NA>
##  9 -69.9  12.5     1     9  Aruba      <NA>
## 10 -69.9  12.5     1    10  Aruba      <NA>
## # ... with 99,328 more rows
# usa boundaries
map_data("usa") %>%
  as_tibble
## # A tibble: 7,243 × 6
##     long   lat group order region subregion
##  * <dbl> <dbl> <dbl> <int>  <chr>     <chr>
##  1  -101  29.7     1     1   main      <NA>
##  2  -101  29.7     1     2   main      <NA>
##  3  -101  29.7     1     3   main      <NA>
##  4  -101  29.6     1     4   main      <NA>
##  5  -101  29.6     1     5   main      <NA>
##  6  -101  29.6     1     6   main      <NA>
##  7  -101  29.6     1     7   main      <NA>
##  8  -101  29.6     1     8   main      <NA>
##  9  -101  29.6     1     9   main      <NA>
## 10  -101  29.6     1    10   main      <NA>
## # ... with 7,233 more rows
map_data("state") %>%
  as_tibble
## # A tibble: 15,537 × 6
##     long   lat group order  region subregion
##  * <dbl> <dbl> <dbl> <int>   <chr>     <chr>
##  1 -87.5  30.4     1     1 alabama      <NA>
##  2 -87.5  30.4     1     2 alabama      <NA>
##  3 -87.5  30.4     1     3 alabama      <NA>
##  4 -87.5  30.3     1     4 alabama      <NA>
##  5 -87.6  30.3     1     5 alabama      <NA>
##  6 -87.6  30.3     1     6 alabama      <NA>
##  7 -87.6  30.3     1     7 alabama      <NA>
##  8 -87.6  30.3     1     8 alabama      <NA>
##  9 -87.7  30.3     1     9 alabama      <NA>
## 10 -87.8  30.3     1    10 alabama      <NA>
## # ... with 15,527 more rows
# county map of illinois
map_data("county", "illinois") %>%
  as_tibble
## # A tibble: 1,697 × 6
##     long   lat group order   region subregion
##  * <dbl> <dbl> <dbl> <int>    <chr>     <chr>
##  1 -91.5  40.2     1     1 illinois     adams
##  2 -90.9  40.2     1     2 illinois     adams
##  3 -90.9  40.2     1     3 illinois     adams
##  4 -90.9  40.1     1     4 illinois     adams
##  5 -90.9  39.8     1     5 illinois     adams
##  6 -90.9  39.8     1     6 illinois     adams
##  7 -91.4  39.8     1     7 illinois     adams
##  8 -91.4  39.8     1     8 illinois     adams
##  9 -91.4  39.9     1     9 illinois     adams
## 10 -91.5  39.9     1    10 illinois     adams
## # ... with 1,687 more rows

map_data() converts the geodatabases into data frames where each row is a single point on the map. The resulting data frame contains the following variables:

  • long - longitude. Things to the west of the prime meridian are negative
  • lat - latitude
  • order - this identifies the order ggplot() should follow to “connect the dots” and draw the borders
  • region and subregion identify what region or subregion a set of points surrounds
  • group - this is perhaps the most important variable in the data frame. ggplot() uses the group aesthetic to determine whether adjacent points should be connected by a line. If they are in the same group, then the points are connected. If not, then they are not connected. This is the same basic principle for standard geom_line() plots:

    library(gapminder)
    
    # no group aesthetic
    ggplot(gapminder, aes(year, lifeExp)) +
      geom_line()

    # with grouping by country
    ggplot(gapminder, aes(year, lifeExp, group = country)) +
      geom_line()

    Note that group is not the same thing as region or subregion. If a region contains landmasses that are discontiguous, there should be multiple groups to properly draw the region:

    map("state", "michigan")

    ### Drawing the United States

Let’s draw a map of the United States. First we need to store the USA map boundaries in a data frame:

usa <- map_data("usa") %>%
  as_tibble
usa
## # A tibble: 7,243 × 6
##     long   lat group order region subregion
##  * <dbl> <dbl> <dbl> <int>  <chr>     <chr>
##  1  -101  29.7     1     1   main      <NA>
##  2  -101  29.7     1     2   main      <NA>
##  3  -101  29.7     1     3   main      <NA>
##  4  -101  29.6     1     4   main      <NA>
##  5  -101  29.6     1     5   main      <NA>
##  6  -101  29.6     1     6   main      <NA>
##  7  -101  29.6     1     7   main      <NA>
##  8  -101  29.6     1     8   main      <NA>
##  9  -101  29.6     1     9   main      <NA>
## 10  -101  29.6     1    10   main      <NA>
## # ... with 7,233 more rows

Simple black map

  • We can use geom_polygon() to draw lines between points and close them up (connect the last point with the first point)
  • x = long and y = lat
  • We map the group aesthetic to the group column
ggplot() +
  geom_polygon(data = usa, aes(x = long, y = lat, group = group))

Ta-da! A few things we want to immediately start thinking about. First, because latitude and longitude have absolute relations to one another, we need to fix the aspect ratio so that we don’t accidentially compress the graph in one dimension. Fixing the aspect ratio also means that even if you change the outer dimensions of the graph (i.e. adjust the window size), then the aspect ratio of the graph itself remains unchanged. We can do this using the coord_fixed() function:

ggplot() +
  geom_polygon(data = usa, aes(x = long, y = lat, group = group)) +
  coord_fixed()

Now it looks a little squished. You can play around with the aspect ratio to find a better projection:3

ggplot() +
  geom_polygon(data = usa, aes(x = long, y = lat, group = group)) +
  coord_fixed(1.3)

Change the colors

Since this is a ggplot() object, we can change the fill and color aesthetics for the map:

ggplot() +
  geom_polygon(data = usa, aes(x = long, y = lat, group = group),
               fill = NA, color = "red") + 
  coord_fixed(1.3)

gg1 <- ggplot() + 
  geom_polygon(data = usa, aes(x = long, y = lat, group = group),
               fill = "violet", color = "blue") + 
  coord_fixed(1.3)
gg1

Always remember to use the group aesthetic

What happens if we plot the map without the group aesthetic?

ggplot() + 
  geom_polygon(data = usa, aes(x = long, y = lat),
               fill = "violet", color = "blue") + 
  coord_fixed(1.3)

Oops. The map doesn’t connect the dots in the correct order.

State maps

maps also comes with a state boundaries geodatabase:

states <- map_data("state") %>%
  as_tibble()
states
## # A tibble: 15,537 × 6
##     long   lat group order  region subregion
##  * <dbl> <dbl> <dbl> <int>   <chr>     <chr>
##  1 -87.5  30.4     1     1 alabama      <NA>
##  2 -87.5  30.4     1     2 alabama      <NA>
##  3 -87.5  30.4     1     3 alabama      <NA>
##  4 -87.5  30.3     1     4 alabama      <NA>
##  5 -87.6  30.3     1     5 alabama      <NA>
##  6 -87.6  30.3     1     6 alabama      <NA>
##  7 -87.6  30.3     1     7 alabama      <NA>
##  8 -87.6  30.3     1     8 alabama      <NA>
##  9 -87.7  30.3     1     9 alabama      <NA>
## 10 -87.8  30.3     1    10 alabama      <NA>
## # ... with 15,527 more rows

By default, each state is filled with the same color:

ggplot(data = states) + 
  geom_polygon(aes(x = long, y = lat, group = group), color = "white") + 
  coord_fixed(1.3)

We can adjust this by using the fill aesthetic. Here, let’s map region to fill:

ggplot(data = states) + 
  geom_polygon(aes(x = long, y = lat, fill = region, group = group), color = "white") + 
  coord_fixed(1.3) +
  # turn off color legend
  theme(legend.position = "none")

Here, each state is assigned a different color at random. You can start to imagine how we might build a choropleth by mapping a different variable to fill, but we’ll return to that in a little bit.

Plot a subset of states

We can use filter() to subset the states data frame and draw a map with only a subset of the states. For example, if we want to only graph states in the Midwest:

midwest <- subset(states, region %in% c("illinois", "indiana", "iowa",
                                        "kansas", "michigan", "minnesota",
                                        "missouri", "nebraska", "north dakota",
                                        "ohio", "south dakota", "wisconsin"))

ggplot(data = midwest) + 
  geom_polygon(aes(x = long, y = lat, group = group),
               fill = "palegreen", color = "black") + 
  coord_fixed(1.3)

Zoom in on Illinois and its counties

First let’s get the Illinois boundaries:

il_df <- filter(states, region == "illinois")

Now let’s get the accompanying counties:

counties <- map_data("county") %>%
  as_tibble
il_county <- filter(counties, region == "illinois")
il_county
## # A tibble: 1,696 × 6
##     long   lat group order   region subregion
##    <dbl> <dbl> <dbl> <int>    <chr>     <chr>
##  1 -91.5  40.2   561 21883 illinois     adams
##  2 -90.9  40.2   561 21884 illinois     adams
##  3 -90.9  40.1   561 21885 illinois     adams
##  4 -90.9  39.8   561 21886 illinois     adams
##  5 -90.9  39.8   561 21887 illinois     adams
##  6 -91.4  39.8   561 21888 illinois     adams
##  7 -91.4  39.8   561 21889 illinois     adams
##  8 -91.4  39.9   561 21890 illinois     adams
##  9 -91.5  39.9   561 21891 illinois     adams
## 10 -91.4  39.9   561 21892 illinois     adams
## # ... with 1,686 more rows

Plot the state first. This time, lets’ remove all the axes gridlines and background junk using theme_void():

il_base <- ggplot(data = il_df, mapping = aes(x = long, y = lat, group = group)) + 
  coord_fixed(1.3) + 
  geom_polygon(color = "black", fill = "gray")

il_base +
  theme_void()

Now let’s plot the county boundaries in white:

il_base +
  theme_void() + 
  geom_polygon(data = il_county, fill = NA, color = "white") +
  geom_polygon(color = "black", fill = NA)  # get the state border back on top

But what about Alaska and Hawaii?

If you were observant, you noticed map_data("states") only includes the 48 contiguous states in the United States. This is because Alaska and Hawaii exist far off from the rest of the states. What happens if you try to draw a map with them in it?

Yup, that doesn’t look right. Most maps of the United States place Alaska and Hawaii as insets to the south of California. Until recently, in R this was an extremely tedious task that required manually changing the latitude and longitude coordinates for these states to place them in the correct location. Fortunately a new package is available that has already done the work for you. fiftystater includes the fifty_states data frame which contains adjusted coordinates for Alaska and Hawaii to plot them with the mainland:

library(fiftystater)

data("fifty_states")
fifty_states %>%
  as_tibble
## # A tibble: 13,694 × 7
##     long   lat order  hole  piece      id     group
##    <dbl> <dbl> <int> <lgl> <fctr>   <chr>    <fctr>
##  1 -85.1  32.0     1 FALSE      1 alabama Alabama.1
##  2 -85.1  31.9     2 FALSE      1 alabama Alabama.1
##  3 -85.1  31.9     3 FALSE      1 alabama Alabama.1
##  4 -85.1  31.8     4 FALSE      1 alabama Alabama.1
##  5 -85.1  31.8     5 FALSE      1 alabama Alabama.1
##  6 -85.1  31.7     6 FALSE      1 alabama Alabama.1
##  7 -85.1  31.7     7 FALSE      1 alabama Alabama.1
##  8 -85.1  31.7     8 FALSE      1 alabama Alabama.1
##  9 -85.1  31.6     9 FALSE      1 alabama Alabama.1
## 10 -85.0  31.6    10 FALSE      1 alabama Alabama.1
## # ... with 13,684 more rows
ggplot(data = fifty_states, mapping = aes(x = long, y = lat, group = group)) + 
  coord_fixed(1.3) + 
  geom_polygon(color = "black", fill = "gray")

Using shapefiles

maps contains a very limited number of geodatabases. If you want to import a different country’s borders or some other geographic information, you will likely find the data in a shapefile. This is a special file format that encodes points, lines, and polygons in geographic space. Files appear with a .shp extension, sometimes with accompanying files ending in .dbf and .prj.

  • .shp stores the geographic coordinates of the geographic features (e.g. country, state, county)
  • .dbf stores data associated with the geographic features (e.g. unemployment rate, crime rates, percentage of votes cast for Donald Trump)
  • .prj stores information about the projection of the coordinates in the shapefile (we’ll handle this shortly)

Let’s start with a shapefile for state boundaries in the United States.4 We’ll use readOGR() from the rgdal package to read in the data file:

library(rgdal)

usa <- readOGR("data/census_bureau/cb_2013_us_state_20m/cb_2013_us_state_20m.shp")
## OGR data source with driver: ESRI Shapefile 
## Source: "data/census_bureau/cb_2013_us_state_20m/cb_2013_us_state_20m.shp", layer: "cb_2013_us_state_20m"
## with 52 features
## It has 9 fields
## Integer64 fields read as strings:  ALAND AWATER
str(usa, max.level = 2)
## Formal class 'SpatialPolygonsDataFrame' [package "sp"] with 5 slots
##   ..@ data       :'data.frame':  52 obs. of  9 variables:
##   ..@ polygons   :List of 52
##   ..@ plotOrder  : int [1:52] 32 48 29 3 18 46 33 22 34 51 ...
##   ..@ bbox       : num [1:2, 1:2] -179.1 17.9 179.8 71.4
##   .. ..- attr(*, "dimnames")=List of 2
##   ..@ proj4string:Formal class 'CRS' [package "sp"] with 1 slot

This is decidedly not a tidy data frame. Once you import the shapefile, it’s best to convert it to a data frame for ggplot(). We can do this using fortify():

usa %>%
  fortify() %>%
  head
##    long  lat order  hole piece id group
## 1 -88.5 31.9     1 FALSE     1  0   0.1
## 2 -88.5 31.9     2 FALSE     1  0   0.1
## 3 -88.5 31.9     3 FALSE     1  0   0.1
## 4 -88.5 31.9     4 FALSE     1  0   0.1
## 5 -88.5 32.0     5 FALSE     1  0   0.1
## 6 -88.5 32.0     6 FALSE     1  0   0.1

Under this approach, the id variable is just a number assigned to each region (in this case, each state/territory). However the shapefile contains linked data with attributes for each region. We can access this using the @data accessor:

usa@data %>%
  as_tibble
## # A tibble: 52 × 9
##    STATEFP  STATENS    AFFGEOID  GEOID STUSPS        NAME   LSAD
##  *  <fctr>   <fctr>      <fctr> <fctr> <fctr>      <fctr> <fctr>
##  1      01 01779775 0400000US01     01     AL     Alabama     00
##  2      05 00068085 0400000US05     05     AR    Arkansas     00
##  3      06 01779778 0400000US06     06     CA  California     00
##  4      09 01779780 0400000US09     09     CT Connecticut     00
##  5      12 00294478 0400000US12     12     FL     Florida     00
##  6      13 01705317 0400000US13     13     GA     Georgia     00
##  7      16 01779783 0400000US16     16     ID       Idaho     00
##  8      17 01779784 0400000US17     17     IL    Illinois     00
##  9      18 00448508 0400000US18     18     IN     Indiana     00
## 10      20 00481813 0400000US20     20     KS      Kansas     00
## # ... with 42 more rows, and 2 more variables: ALAND <fctr>, AWATER <fctr>

We can keep these variables in the new data frame through parameters to fortify(region = ""):

# state name
usa %>%
  fortify(region = "NAME") %>%
  head
##    long  lat order  hole piece      id     group
## 1 -88.5 31.9     1 FALSE     1 Alabama Alabama.1
## 2 -88.5 31.9     2 FALSE     1 Alabama Alabama.1
## 3 -88.5 31.9     3 FALSE     1 Alabama Alabama.1
## 4 -88.5 31.9     4 FALSE     1 Alabama Alabama.1
## 5 -88.5 32.0     5 FALSE     1 Alabama Alabama.1
## 6 -88.5 32.0     6 FALSE     1 Alabama Alabama.1
# FIPS code
usa %>%
  fortify(region = "STATEFP") %>%
  head
##    long  lat order  hole piece id group
## 1 -88.5 31.9     1 FALSE     1 01  01.1
## 2 -88.5 31.9     2 FALSE     1 01  01.1
## 3 -88.5 31.9     3 FALSE     1 01  01.1
## 4 -88.5 31.9     4 FALSE     1 01  01.1
## 5 -88.5 32.0     5 FALSE     1 01  01.1
## 6 -88.5 32.0     6 FALSE     1 01  01.1
# keep it all
(usa2 <- usa %>%
  fortify(region = "NAME") %>%
  as_tibble %>%
  left_join(usa@data, by = c("id" = "NAME")))
## # A tibble: 46,174 × 15
##     long   lat order  hole  piece      id     group STATEFP  STATENS
##    <dbl> <dbl> <int> <lgl> <fctr>   <chr>    <fctr>  <fctr>   <fctr>
##  1 -88.5  31.9     1 FALSE      1 Alabama Alabama.1      01 01779775
##  2 -88.5  31.9     2 FALSE      1 Alabama Alabama.1      01 01779775
##  3 -88.5  31.9     3 FALSE      1 Alabama Alabama.1      01 01779775
##  4 -88.5  31.9     4 FALSE      1 Alabama Alabama.1      01 01779775
##  5 -88.5  32.0     5 FALSE      1 Alabama Alabama.1      01 01779775
##  6 -88.5  32.0     6 FALSE      1 Alabama Alabama.1      01 01779775
##  7 -88.5  32.1     7 FALSE      1 Alabama Alabama.1      01 01779775
##  8 -88.4  32.2     8 FALSE      1 Alabama Alabama.1      01 01779775
##  9 -88.4  32.2     9 FALSE      1 Alabama Alabama.1      01 01779775
## 10 -88.4  32.2    10 FALSE      1 Alabama Alabama.1      01 01779775
## # ... with 46,164 more rows, and 6 more variables: AFFGEOID <fctr>,
## #   GEOID <fctr>, STUSPS <fctr>, LSAD <fctr>, ALAND <fctr>, AWATER <fctr>

FIPS codes for each state and outlying territory.

Now we can plot it like normal using ggplot():

ggplot(data = usa2, mapping = aes(x = long, y = lat, group = group)) + 
  coord_fixed(1.3) + 
  geom_polygon(color = "black", fill = "gray")

Because the data file comes from the Census Bureau, we also get boundaries for Alaska, Hawaii, and Puerto Rico. To remove them from the data, just use filter():

usa2 <- usa2 %>%
  filter(id != "Alaska", id != "Hawaii", id != "Puerto Rico")

ggplot(data = usa2, mapping = aes(x = long, y = lat, group = group)) + 
  coord_fixed(1.3) + 
  geom_polygon(color = "black", fill = "gray")

ggmap

Rather than relying on geodatabases or shapefiles which store boundaries as numeric data, we can use ggmap to retrieve raster map tiles from online mapping services.

library(ggmap)
get_stamenmap(c(left = -87.6189, bottom = 41.7723, right = -87.5721, top = 41.8107), zoom = 14) %>%
  ggmap()    # NOTE: this will generate an error with ggplot2 v.2.2.1+

get_openstreetmap(bbox = c(left = -87.6189, bottom = 41.7723, right = -87.5721, top = 41.8107)) %>%
  ggmap()
## Error: map grabbing failed - see details in ?get_openstreetmap.

OpenStreetMap API can be flaky. Sometimes you have to come back later and try running get_openstreetmap() to properly download the map tiles.

get_googlemap("university of chicago", zoom = 12) %>%
  ggmap()

Changing map projections

As you saw in The Truthful Art, representing portions of the globe on a flat surface can be challenging. Depending on how you project the map, you can distort or emphasize certain features of the map. Fortunately, ggplot() includes the coord_map() function which allows us to easily implement different projection methods.5 Depending on the projection method, you may need to pass additional arguments to coord_map() to define the standard parallel lines used in projecting the map:

ggplot(data = usa2, mapping = aes(x = long, y = lat, group = group)) + 
  geom_polygon(color = "black", fill = "gray") +
  coord_map() +
  ggtitle("Mercator projection (default)")

ggplot(data = usa2, mapping = aes(x = long, y = lat, group = group)) + 
  geom_polygon(color = "black", fill = "gray") +
  coord_map(projection = "albers", lat0 = 25, lat1 = 50) +
  ggtitle("Albers equal-area projection")

ggplot(data = usa2, mapping = aes(x = long, y = lat, group = group)) + 
  geom_polygon(color = "black", fill = "gray") +
  coord_map(projection = "lambert", lat0 = 25, lat1 = 50) +
  ggtitle("Lambert equal-area projection")

ggplot(data = usa2, mapping = aes(x = long, y = lat, group = group)) + 
  geom_polygon(color = "black", fill = "gray") +
  coord_map(projection = "conic", lat0 = 40) +
  ggtitle("Conic projection")

ggplot(data = usa2, mapping = aes(x = long, y = lat, group = group)) + 
  geom_polygon(color = "black", fill = "gray") +
  coord_map(projection = "mollweide") +
  ggtitle("Mollweide projection")

ggplot(data = map_data("world"), mapping = aes(x = long, y = lat, group = group)) + 
  geom_polygon(color = "black", fill = "gray") +
  coord_map(projection = "mollweide", xlim = c(-180, 180)) +
  ggtitle("Mollweide projection")

Adding data to the map

Region boundaries serve as the background in geospatial data visualization - so now we need to add data. Some types of channels (points and symbols) are overlaid on top of the boundaries, whereas other channels (fill) are incorporated into the region layer itself. Let’s look at the first set of channels.

Points

Let’s use our usa2 map data to add some points. The airports data frame in the nycflights13 package includes geographic info on airports in the United States.

library(nycflights13)
airports
## # A tibble: 1,458 × 8
##      faa                           name   lat    lon   alt    tz   dst
##    <chr>                          <chr> <dbl>  <dbl> <int> <dbl> <chr>
##  1   04G              Lansdowne Airport  41.1  -80.6  1044    -5     A
##  2   06A  Moton Field Municipal Airport  32.5  -85.7   264    -6     A
##  3   06C            Schaumburg Regional  42.0  -88.1   801    -6     A
##  4   06N                Randall Airport  41.4  -74.4   523    -5     A
##  5   09J          Jekyll Island Airport  31.1  -81.4    11    -5     A
##  6   0A9 Elizabethton Municipal Airport  36.4  -82.2  1593    -5     A
##  7   0G6        Williams County Airport  41.5  -84.5   730    -5     A
##  8   0G7  Finger Lakes Regional Airport  42.9  -76.8   492    -5     A
##  9   0P2   Shoestring Aviation Airfield  39.8  -76.6  1000    -5     U
## 10   0S9          Jefferson County Intl  48.1 -122.8   108    -8     A
## # ... with 1,448 more rows, and 1 more variables: tzone <chr>

Each airport has it’s geographic location encoded through lat and lon. To draw these points on the map, basically we draw a scatterplot with x = lon and y = lat. In fact we could simply do that:

ggplot(airports, aes(lon, lat)) +
  geom_point()

Let’s overlay it with the mapped state borders:

ggplot() + 
  coord_map() + 
  geom_polygon(data = usa2, mapping = aes(x = long, y = lat, group = group),
               color = "black", fill = "gray") +
  geom_point(data = airports, aes(x = lon, y = lat), shape = 1)

Slight problem. We have airports listed outside of the continental United States. There are a couple ways to rectify this. Unfortunately airports does not include a variable identifying state so the filter() operation is not that simple. The easiest solution is to crop the limits of the graph to only show the mainland:

ggplot() + 
  coord_map(xlim = c(-130, -60),
            ylim = c(20, 50)) + 
  geom_polygon(data = usa2, mapping = aes(x = long, y = lat, group = group),
               color = "black", fill = "gray") +
  geom_point(data = airports, aes(x = lon, y = lat), shape = 1)

If we want to change the projection method, the points will automatically adjust too:

ggplot() + 
  coord_map(projection = "albers", lat0 = 25, lat1 = 50,
            xlim = c(-130, -60),
            ylim = c(20, 50)) + 
  geom_polygon(data = usa2, mapping = aes(x = long, y = lat, group = group),
               color = "black", fill = "gray") +
  geom_point(data = airports, aes(x = lon, y = lat), shape = 1)

Symbols

We can change the size or type of symbols on the map. For instance, we can draw a bubble plot (also known as a proportional symbol map) and encode the altitude of the airport through the size channel:

ggplot() + 
  coord_map(xlim = c(-130, -60),
            ylim = c(20, 50)) + 
  geom_polygon(data = usa2, mapping = aes(x = long, y = lat, group = group),
               color = "black", fill = "white") +
  geom_point(data = airports, aes(x = lon, y = lat, size = alt),
             fill = "grey", color = "black", alpha = .2) +
  theme_void() +
  theme(legend.position = "none")

Circle area is proportional to the airport’s altitude (in feet). Or we could scale it based on the number of arriving flights in flights:

airports_n <- flights %>%
  count(dest) %>%
  left_join(airports, by = c("dest" = "faa"))

ggplot() + 
  coord_map(xlim = c(-130, -60),
            ylim = c(20, 50)) + 
  geom_polygon(data = usa2, mapping = aes(x = long, y = lat, group = group),
               color = "black", fill = "white") +
  geom_point(data = airports_n, aes(x = lon, y = lat, size = n),
             fill = "grey", color = "black", alpha = .2) +
  theme_void() +
  theme(legend.position = "none")

airports contains a list of virtually all commercial airports in the United States. However flights only contains data on flights departing from New York City airports (JFK, LaGuardia, or Newark) and only services a few airports around the country.

Drawing choropleth maps

Choropleth maps encode information by assigning shades of colors to defined areas on a map (e.g. countries, states, counties, zip codes). There are lots of ways to tweak and customize these graphs, which is generally a good idea because remember that color is one of the harder-to-decode channels.

Loading the data

We’ll continue to use the usa2 shapefile. Let’s reload it and also load and tidy the county shapefile:

usa <- readOGR("data/census_bureau/cb_2013_us_state_20m/cb_2013_us_state_20m.shp")
## OGR data source with driver: ESRI Shapefile 
## Source: "data/census_bureau/cb_2013_us_state_20m/cb_2013_us_state_20m.shp", layer: "cb_2013_us_state_20m"
## with 52 features
## It has 9 fields
## Integer64 fields read as strings:  ALAND AWATER
usa2 <- usa %>%
  fortify(region = "GEOID") %>%
  as_tibble %>%
  left_join(usa@data, by = c("id" = "GEOID")) %>%
  # filter out Alaska, Hawaii, Puerto Rico via FIPS codes
  filter(!(STATEFP %in% c("02", "15", "72")))

counties <- readOGR("data/census_bureau/cb_2013_us_county_20m/cb_2013_us_county_20m.shp")
## OGR data source with driver: ESRI Shapefile 
## Source: "data/census_bureau/cb_2013_us_county_20m/cb_2013_us_county_20m.shp", layer: "cb_2013_us_county_20m"
## with 3221 features
## It has 9 fields
## Integer64 fields read as strings:  ALAND AWATER
counties2 <- counties %>%
  fortify(region = "GEOID") %>%
  as_tibble %>%
  left_join(counties@data, by = c("id" = "GEOID")) %>%
  # filter out Alaska, Hawaii, Puerto Rico via FIPS codes
  filter(!(STATEFP %in% c("02", "15", "72")))

ggplot(counties2, mapping = aes(x = long, y = lat, group = group)) + 
  geom_polygon(color = "black", fill = "gray") +
  coord_map()

We’ll draw choropleths for the number of foreign-born individuals in each region (state or county). We can get those files from the census_bureau folder. Let’s also normalize our measure by the total population to get the rate of foreign-born individuals in the population:

(fb_state <- read_csv("data/census_bureau/ACS_13_5YR_B05012_state/ACS_13_5YR_B05012.csv") %>%
  mutate(rate = HD01_VD03 / HD01_VD01))
## # A tibble: 51 × 10
##         GEO.id GEO.id2  `GEO.display-label` HD01_VD01 HD02_VD01 HD01_VD02
##          <chr>   <chr>                <chr>     <int>     <chr>     <int>
##  1 0400000US01      01              Alabama   4799277      <NA>   4631045
##  2 0400000US02      02               Alaska    720316      <NA>    669941
##  3 0400000US04      04              Arizona   6479703      <NA>   5609835
##  4 0400000US05      05             Arkansas   2933369      <NA>   2799972
##  5 0400000US06      06           California  37659181      <NA>  27483342
##  6 0400000US08      08             Colorado   5119329      <NA>   4623809
##  7 0400000US09      09          Connecticut   3583561      <NA>   3096374
##  8 0400000US10      10             Delaware    908446      <NA>    831683
##  9 0400000US11      11 District of Columbia    619371      <NA>    534142
## 10 0400000US12      12              Florida  19091156      <NA>  15392410
## # ... with 41 more rows, and 4 more variables: HD02_VD02 <int>,
## #   HD01_VD03 <int>, HD02_VD03 <int>, rate <dbl>
(fb_county <- read_csv("data/census_bureau/ACS_13_5YR_B05012_county/ACS_13_5YR_B05012.csv") %>%
  mutate(rate = HD01_VD03 / HD01_VD01))
## # A tibble: 3,143 × 10
##            GEO.id GEO.id2      `GEO.display-label` HD01_VD01 HD02_VD01
##             <chr>   <chr>                    <chr>     <int>     <int>
##  1 0500000US01001   01001  Autauga County, Alabama     54907        NA
##  2 0500000US01003   01003  Baldwin County, Alabama    187114        NA
##  3 0500000US01005   01005  Barbour County, Alabama     27321        NA
##  4 0500000US01007   01007     Bibb County, Alabama     22754        NA
##  5 0500000US01009   01009   Blount County, Alabama     57623        NA
##  6 0500000US01011   01011  Bullock County, Alabama     10746        NA
##  7 0500000US01013   01013   Butler County, Alabama     20624        NA
##  8 0500000US01015   01015  Calhoun County, Alabama    117714        NA
##  9 0500000US01017   01017 Chambers County, Alabama     34145        NA
## 10 0500000US01019   01019 Cherokee County, Alabama     26034        NA
## # ... with 3,133 more rows, and 5 more variables: HD01_VD02 <int>,
## #   HD02_VD02 <int>, HD01_VD03 <int>, HD02_VD03 <int>, rate <dbl>

Joining the data to regions

Now that we have our data, we want to draw it on the map. To do that, we have to join together our data sources - the shapefiles and the CSVs. Normally joining data files requires a _join() operation of some sort. However when using ggplot2, we don’t have to do this. Remember that we can pass different data frames into different layers of a ggplot() object. Rather than using geom_polygon() to draw our maps, now we switch to geom_map():

ggplot(fb_state, aes(map_id = GEO.id2)) +
  geom_map(aes(fill = rate), map = usa2) +
  expand_limits(x = usa2$long, y = usa2$lat)

Let’s break down what just happened:

  • fb_state is the data frame with the variables we want to visualize
  • map_id = GEO.id2 identifies the column in fb_state that uniquely matches each observation to a region in usa2
  • geom_map(aes(fill = rate), map = usa2)
    • fill = rate identifies the column in fb_state we will use to determine the color of each region
    • map = usa2 is the data frame containing the boundary coordinates
  • expand_limits(x = usa2$long, y = usa2$lat) ensures the graph is drawn to the proper window. Because the default data frame for this ggplot() object is fb_state, it won’t contain the necessary information to size the window

We can then tweak this up by adding a title, removing the background (but retaining the legend), and projecting the map using a different method:

ggplot(fb_state, aes(map_id = GEO.id2)) +
  geom_map(aes(fill = rate), map = usa2) +
  expand_limits(x = usa2$long, y = usa2$lat) +
  scale_fill_continuous(labels = scales::percent) +
  labs(title = "Rate of foreign-born individuals in the population",
       fill = NULL) +
  ggthemes::theme_map() +
  coord_map(projection = "albers", lat0 = 25, lat1 = 50)

We could do the same thing for the county-level data:

ggplot(fb_county, aes(map_id = GEO.id2)) +
  geom_map(aes(fill = rate), map = counties2) +
  expand_limits(x = counties2$long, y = counties2$lat) +
  scale_fill_continuous(labels = scales::percent) +
  labs(title = "Rate of foreign-born individuals in the population",
       fill = NULL) +
  ggthemes::theme_map() +
  coord_map(projection = "albers", lat0 = 25, lat1 = 50)

Binning data

  • cut_interval() - makes n groups with equal range
fb_county %>%
  mutate(rate_cut = cut_interval(rate, 6)) %>%
  ggplot(aes(map_id = GEO.id2)) +
  geom_map(aes(fill = rate_cut), map = counties2) +
  expand_limits(x = counties2$long, y = counties2$lat) +
  labs(title = "Rate of foreign-born individuals in the population",
       fill = NULL) +
  ggthemes::theme_map() +
  coord_map(projection = "albers", lat0 = 25, lat1 = 50)

  • cut_number() - makes n groups with (approximately) equal numbers of observations
fb_county %>%
  mutate(rate_cut = cut_number(rate, 6)) %>%
  ggplot(aes(map_id = GEO.id2)) +
  geom_map(aes(fill = rate_cut), map = counties2) +
  expand_limits(x = counties2$long, y = counties2$lat) +
  labs(title = "Rate of foreign-born individuals in the population",
       fill = NULL) +
  ggthemes::theme_map() +
  coord_map(projection = "albers", lat0 = 25, lat1 = 50)

Defining colors

Recall that Cleveland and McGill identify the color channel as one of the most difficult channels for humans to properly decode and interpret. Selection of your color palette is perhaps the most important decision to make when drawing a choropleth.

By default, ggplot2 picks evenly spaced hues around the Hue-Chroma-Luminance (HCL) color space:6

# generate simulated data points
sim_points <- data_frame(x = factor(1:6))

plots <- purrr::map(1:6, ~ ggplot(sim_points[1:.x, ], aes(x, x, color = x)) +
  geom_point(size = 5) +
    ggtitle(paste(.x, "color")) +
  theme(legend.position = "none"))

gridExtra::marrangeGrob(plots, nrow = 2, ncol = 3, top = NULL)

ggplot2 gives you many different ways of defining and customizing your scale_color_ and scale_fill_ palettes, but will not tell you if they are optimal for your specific usage in the graph.

RColorBrewer

Color Brewer is a diagnostic tool for selecting optimal color palettes for maps with discrete variables. The authors have generated different color palettes designed to make differentiating between categories easy depending on the scaling of your variable. All you need to do is define the number of categories in the variable, the nature of your data (sequential, diverging, or qualitative), and a color scheme. There are also options to select palettes that are colorblind safe, print friendly, and photocopy safe. Depending on the combination of options, you may not find any color palette that matches your criteria. In such a case, consider reducing the number of data classes.

Sequential

fb_county %>%
  mutate(rate_cut = cut_number(rate, 6)) %>%
  ggplot(aes(map_id = GEO.id2)) +
  geom_map(aes(fill = rate_cut), map = counties2) +
  expand_limits(x = counties2$long, y = counties2$lat) +
  labs(title = "Rate of foreign-born individuals in the population",
       fill = NULL) +
  ggthemes::theme_map() +
  coord_map(projection = "albers", lat0 = 25, lat1 = 50) +
  scale_fill_brewer(palette = "BuGn")

fb_county %>%
  mutate(rate_cut = cut_number(rate, 6)) %>%
  ggplot(aes(map_id = GEO.id2)) +
  geom_map(aes(fill = rate_cut), map = counties2) +
  expand_limits(x = counties2$long, y = counties2$lat) +
  labs(title = "Rate of foreign-born individuals in the population",
       fill = NULL) +
  ggthemes::theme_map() +
  coord_map(projection = "albers", lat0 = 25, lat1 = 50) +
  scale_fill_brewer(palette = "YlGn")

fb_county %>%
  mutate(rate_cut = cut_number(rate, 6)) %>%
  ggplot(aes(map_id = GEO.id2)) +
  geom_map(aes(fill = rate_cut), map = counties2) +
  expand_limits(x = counties2$long, y = counties2$lat) +
  labs(title = "Rate of foreign-born individuals in the population",
       fill = NULL) +
  ggthemes::theme_map() +
  coord_map(projection = "albers", lat0 = 25, lat1 = 50) +
  scale_fill_brewer(palette = "Blues")

Qualitative

state_data <- data_frame(name = state.name,
                         region = state.region,
                         subregion = state.division,
                         abb = state.abb) %>%
  bind_cols(as_tibble(state.x77)) %>%
  # get id variable into data frame
  left_join(usa2 %>%
              select(id, NAME) %>%
              distinct,
            by = c("name" = "NAME")) %>%
  # remove Alaska and Hawaii
  na.omit

# set region base plot
region_p <- ggplot(state_data, aes(map_id = id)) +
  geom_map(aes(fill = region), map = usa2) +
  expand_limits(x = usa2$long, y = usa2$lat) +
  labs(fill = NULL) +
  ggthemes::theme_map() +
  coord_map(projection = "albers", lat0 = 25, lat1 = 50)
region_p

# try different color brewers
region_p +
  scale_fill_brewer(palette = "Paired")

region_p +
  scale_fill_brewer(palette = "Dark2")

region_p +
  scale_fill_brewer(palette = "Pastel2")

# set subregion base plot
subregion_p <- ggplot(state_data, aes(map_id = id)) +
  geom_map(aes(fill = subregion), map = usa2) +
  expand_limits(x = usa2$long, y = usa2$lat) +
  labs(fill = NULL) +
  ggthemes::theme_map() +
  coord_map(projection = "albers", lat0 = 25, lat1 = 50)
subregion_p

subregion_p +
  scale_fill_brewer(palette = "Paired")

subregion_p +
  scale_fill_brewer(palette = "Set1")

subregion_p +
  scale_fill_brewer(palette = "Pastel1")

Faceting maps

Get world fertility rate data

# Shapefile
world <- readOGR("data/nautral_earth/ne_110m_admin_0_countries/ne_110m_admin_0_countries.shp")
## OGR data source with driver: ESRI Shapefile 
## Source: "data/nautral_earth/ne_110m_admin_0_countries/ne_110m_admin_0_countries.shp", layer: "ne_110m_admin_0_countries"
## with 177 features
## It has 63 fields
world2 <- fortify(world, region = "iso_a3")
 
# Country-level data
fertility <- read_csv("data/API_SP.DYN.TFRT.IN_DS2_en_csv_v2/API_SP.DYN.TFRT.IN_DS2_en_csv_v2.csv") %>%
  select(-X62) %>%
  # tidy the data frame
  gather(year, fertility, `1960`:`2016`, convert = TRUE) %>%
  filter(year < 2015) %>%
  mutate(fertility = as.numeric(fertility),
         # cut into six intervals
         fertility_rate = cut_interval(fertility, 6))

For loop

# use a for loop
for(year in c(1970, 2010)){
  p <- fertility %>%
    filter(year == year) %>%
    ggplot(aes(map_id = `Country Code`)) +
    geom_map(aes(fill = fertility_rate), map = world2) +
    expand_limits(x = world2$long, y = world2$lat) +
    scale_fill_brewer(palette = "BuGn") +
    labs(title = "Fertility rate",
         fill = NULL) +
    ggthemes::theme_map() +
    coord_map(projection = "mollweide", xlim = c(-180, 180))
  print(p)
}

purrr::map()

purrr::map(c(1970, 2010), ~ fertility %>%
    filter(year == .x) %>%
    ggplot(aes(map_id = `Country Code`)) +
    geom_map(aes(fill = fertility_rate), map = world2) +
    expand_limits(x = world2$long, y = world2$lat) +
    scale_fill_brewer(palette = "BuGn") +
    labs(title = "Fertility rate",
         fill = NULL) +
    ggthemes::theme_map() +
    coord_map(projection = "mollweide", xlim = c(-180, 180)))
## [[1]]

## 
## [[2]]

facet_wrap()

ggplot(fertility, aes(map_id = `Country Code`)) +
  facet_wrap(~ year) +
  geom_map(aes(fill = fertility_rate), map = world2) +
  expand_limits(x = world2$long, y = world2$lat) +
  scale_fill_brewer(palette = "BuGn") +
  labs(title = "Fertility rate",
       fill = NULL) +
  ggthemes::theme_map() +
  coord_map(projection = "mollweide", xlim = c(-180, 180)) +
  theme(legend.position = "none")

south_africa <- world2 %>%
  filter(id == "ZAF")

fertility %>%
  filter(`Country Name` == "South Africa") %>%
  ggplot(aes(map_id = `Country Code`)) +
  facet_wrap(~ year) +
  geom_map(aes(fill = fertility_rate), map = south_africa) +
  expand_limits(x = south_africa$long, y = south_africa$lat) +
  scale_fill_brewer(palette = "BuGn") +
  labs(title = "Fertility rate",
       subtitle = "South Africa",
       fill = NULL) +
  ggthemes::theme_map() +
  coord_map() +
  theme(legend.position = "none")

gganimate

Use the frame aesthetic.

library(gganimate)

p <- ggplot(fertility, aes(map_id = `Country Code`, frame = year)) +
  geom_map(aes(fill = fertility_rate), map = world2) +
  expand_limits(x = world2$long, y = world2$lat) +
  scale_fill_brewer(palette = "BuGn") +
  labs(title = "Fertility rate",
       fill = NULL) +
  ggthemes::theme_map() +
  coord_map(projection = "mollweide", xlim = c(-180, 180)) +
  theme(legend.position = "none")
gg_animate(p)

Making maps interactive

plotly

ggplotly()

p <- ggplot(fb_state, aes(map_id = GEO.id2)) +
  geom_map(aes(fill = rate), map = usa2) +
  expand_limits(x = usa2$long, y = usa2$lat) +
  scale_fill_continuous(labels = scales::percent) +
  labs(title = "Rate of foreign-born individuals in the population",
       fill = NULL) +
  ggthemes::theme_map() +
  coord_map(projection = "albers", lat0 = 25, lat1 = 50)

ggplotly(p)

Unfortunately you lose the Albers projection.

plot_ly()

# give state boundaries a white border
l <- list(color = toRGB("white"), width = 2)

# specify some map projection/options
g <- list(
  scope = 'usa',
  projection = list(type = 'albers usa'),
  showlakes = TRUE,
  lakecolor = toRGB('white')
)

p <- fb_state %>%
  # get state abbrev for matching to geodatabase
  left_join(state_data, by = c("GEO.display-label" = "name")) %>%
  mutate(rate = rate * 100,
         hover = paste("Foreign-born rate:", rate)) %>%
  plot_geo(locationmode = 'USA-states') %>%
  add_trace(
    z = ~rate, text = ~hover, locations = ~abb,
    color = ~rate, colors = 'Purples'
  ) %>%
  colorbar(title = "Percentage") %>%
  layout(
    title = "Rate of foreign-born individuals in the population",
    geo = g
  )
p

leaflet

See Leaflet for R for more information.

Session Info

devtools::session_info()
##  setting  value                       
##  version  R version 3.3.3 (2017-03-06)
##  system   x86_64, darwin13.4.0        
##  ui       X11                         
##  language (EN)                        
##  collate  en_US.UTF-8                 
##  tz       America/Chicago             
##  date     2017-05-08                  
## 
##  package      * version    date       source                           
##  assertthat     0.2.0      2017-04-11 cran (@0.2.0)                    
##  backports      1.0.5      2017-01-18 CRAN (R 3.3.2)                   
##  bitops         1.0-6      2013-08-17 CRAN (R 3.3.0)                   
##  broom        * 0.4.2      2017-02-13 CRAN (R 3.3.2)                   
##  codetools      0.2-15     2016-10-05 CRAN (R 3.3.3)                   
##  colorspace     1.3-2      2016-12-14 CRAN (R 3.3.2)                   
##  DBI            0.6        2017-03-09 CRAN (R 3.3.3)                   
##  devtools       1.12.0     2016-06-24 CRAN (R 3.3.0)                   
##  digest         0.6.12     2017-01-27 CRAN (R 3.3.2)                   
##  dplyr        * 0.5.0      2016-06-24 CRAN (R 3.3.0)                   
##  evaluate       0.10       2016-10-11 CRAN (R 3.3.0)                   
##  fiftystater  * 1.0.1      2016-11-14 CRAN (R 3.3.2)                   
##  forcats      * 0.2.0      2017-01-23 CRAN (R 3.3.2)                   
##  foreign        0.8-67     2016-09-13 CRAN (R 3.3.3)                   
##  gapminder    * 0.2.0      2015-12-31 CRAN (R 3.3.0)                   
##  geosphere      1.5-5      2016-06-15 CRAN (R 3.3.0)                   
##  gganimate    * 0.1        2016-11-11 Github (dgrtwo/gganimate@26ec501)
##  ggmap        * 2.7        2016-12-07 Github (dkahle/ggmap@c6b7579)    
##  ggplot2      * 2.2.1      2016-12-30 CRAN (R 3.3.2)                   
##  gtable         0.2.0      2016-02-26 CRAN (R 3.3.0)                   
##  haven          1.0.0      2016-09-23 cran (@1.0.0)                    
##  hms            0.3        2016-11-22 CRAN (R 3.3.2)                   
##  htmltools      0.3.6      2017-04-28 cran (@0.3.6)                    
##  htmlwidgets    0.8        2016-11-09 CRAN (R 3.3.1)                   
##  httr           1.2.1      2016-07-03 CRAN (R 3.3.0)                   
##  jpeg           0.1-8      2014-01-23 cran (@0.1-8)                    
##  jsonlite       1.4        2017-04-08 cran (@1.4)                      
##  knitr        * 1.15.1     2016-11-22 cran (@1.15.1)                   
##  lattice        0.20-34    2016-09-06 CRAN (R 3.3.3)                   
##  lazyeval       0.2.0      2016-06-12 CRAN (R 3.3.0)                   
##  lubridate      1.6.0      2016-09-13 CRAN (R 3.3.0)                   
##  magrittr       1.5        2014-11-22 CRAN (R 3.3.0)                   
##  mapproj        1.2-4      2015-08-03 CRAN (R 3.3.0)                   
##  maps         * 3.1.1      2016-07-27 CRAN (R 3.3.0)                   
##  maptools     * 0.9-2      2017-03-25 CRAN (R 3.3.2)                   
##  memoise        1.0.0      2016-01-29 CRAN (R 3.3.0)                   
##  mnormt         1.5-5      2016-10-15 CRAN (R 3.3.0)                   
##  modelr       * 0.1.0      2016-08-31 CRAN (R 3.3.0)                   
##  munsell        0.4.3      2016-02-13 CRAN (R 3.3.0)                   
##  nlme           3.1-131    2017-02-06 CRAN (R 3.3.3)                   
##  nycflights13 * 0.2.2      2017-01-27 CRAN (R 3.3.2)                   
##  plotly       * 4.6.0      2017-04-25 CRAN (R 3.3.3)                   
##  plyr           1.8.4      2016-06-08 CRAN (R 3.3.0)                   
##  png            0.1-7      2013-12-03 cran (@0.1-7)                    
##  proto          1.0.0      2016-10-29 CRAN (R 3.3.0)                   
##  psych          1.7.3.21   2017-03-22 CRAN (R 3.3.2)                   
##  purrr        * 0.2.2      2016-06-18 CRAN (R 3.3.0)                   
##  R6             2.2.0      2016-10-05 CRAN (R 3.3.0)                   
##  Rcpp           0.12.10    2017-03-19 cran (@0.12.10)                  
##  readr        * 1.1.0      2017-03-22 cran (@1.1.0)                    
##  readxl         0.1.1      2016-03-28 CRAN (R 3.3.0)                   
##  reshape2       1.4.2      2016-10-22 CRAN (R 3.3.0)                   
##  rgdal        * 1.2-7      2017-04-25 CRAN (R 3.3.2)                   
##  rgeos        * 0.3-23     2017-04-06 CRAN (R 3.3.2)                   
##  RgoogleMaps    1.4.1      2016-09-18 cran (@1.4.1)                    
##  rjson          0.2.15     2014-11-03 cran (@0.2.15)                   
##  rlang          0.0.0.9018 2017-05-01 Github (hadley/rlang@460323e)    
##  rmarkdown      1.3        2016-12-21 CRAN (R 3.3.2)                   
##  rprojroot      1.2        2017-01-16 CRAN (R 3.3.2)                   
##  rvest          0.3.2      2016-06-17 CRAN (R 3.3.0)                   
##  scales         0.4.1      2016-11-09 CRAN (R 3.3.1)                   
##  sp           * 1.2-4      2016-12-22 CRAN (R 3.3.2)                   
##  stringi        1.1.2      2016-10-01 CRAN (R 3.3.0)                   
##  stringr      * 1.2.0      2017-02-18 CRAN (R 3.3.2)                   
##  tibble       * 1.3.0.9001 2017-05-01 Github (tidyverse/tibble@08af6b0)
##  tidyr        * 0.6.1      2017-01-10 CRAN (R 3.3.2)                   
##  tidyverse    * 1.1.1      2017-01-27 CRAN (R 3.3.2)                   
##  viridisLite    0.2.0      2017-03-24 cran (@0.2.0)                    
##  withr          1.0.2      2016-06-20 CRAN (R 3.3.0)                   
##  xml2           1.1.1      2017-01-24 CRAN (R 3.3.2)                   
##  yaml           2.1.14     2016-11-12 cran (@2.1.14)

  1. See chapter 10 in The Truthful Art for a more detailed introduction to these features.

  2. Source: Wikipedia.

  3. Or as we’ll see shortly, apply a map projection to the graph.

  4. Originally obtained from the Census Bureau.

  5. This function replaces coord_fixed().

  6. Check out chapter 6.6.2 in ggplot2: Elegant Graphics for Data Analysis for a much more thorough explanation of the theory behind this selection process